
    The Data Cyclotron query processing scheme

    Distributed database systems exploit static workload characteristics to steer data fragmentation and data allocation schemes. However, the grand challenge of distributed query processing is to come up with a self-organizing architecture, which exploits all resources to manage the hot data set, minimize query response time, and maximize throughput without global coordination. In this paper, we introduce the Data Cyclotron architecture, which addresses these challenges using turbulent data movement through a storage ring built from distributed main memory and capitalizing on modern remote-DMA facilities. Queries assigned to individual nodes interact with the Data Cyclotron by picking up the data fragments continuously flowing around, i.e., the hot set. Each data fragment carries a level-of-interest (LOI) metric, which represents the cumulative query interest as the fragment passes around the ring multiple times. A fragment with an LOI below a given threshold, inversely proportional to the ring load, is pulled out of the ring.
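
    The eviction logic described above can be made concrete with a small sketch. The following toy Python fragment is not the paper's implementation: Fragment, Ring, BASE_THRESHOLD, and DECAY are invented names, and the exact LOI arithmetic is an assumption; only the idea that fragments accumulate query interest and leave the ring once their LOI falls below a threshold inversely proportional to the ring load comes from the abstract.

    from dataclasses import dataclass

    BASE_THRESHOLD = 1.0  # assumed tuning constant, not from the paper
    DECAY = 0.5           # assumed per-revolution decay of accumulated interest

    @dataclass
    class Fragment:
        fid: int
        loi: float = 0.0  # cumulative query interest

    class Ring:
        def __init__(self, fragments):
            self.fragments = list(fragments)

        def register_interest(self, fragment, queries_touched):
            # Every query that picks up the fragment as it flows by
            # raises its level of interest.
            fragment.loi += queries_touched

        def end_of_cycle(self):
            # Per the abstract, the eviction threshold is inversely
            # proportional to the ring load.
            load = len(self.fragments) or 1
            threshold = BASE_THRESHOLD / load
            survivors = []
            for f in self.fragments:
                f.loi *= DECAY  # old interest fades with each revolution
                if f.loi >= threshold:
                    survivors.append(f)  # stays in the hot set
                # else: pulled out of the ring and becomes cold
            self.fragments = survivors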

    Peak Performance – Remote Memory Revisited

    Many database systems share a need for large amounts of fast storage. However, economies of scale limit the utility of extending a single machine with an arbitrary amount of memory. The recent broad availability of the zero-copy data transfer protocol RDMA over low-latency, high-throughput network connections such as InfiniBand prompts us to revisit the long-proposed use of memory provided by remote machines. In this paper, we present a solution that makes use of remote memory without manipulating the operating system, and we investigate the impact on database performance.
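
    To illustrate the idea (though not the paper's mechanism), here is a toy Python sketch in which a plain dictionary stands in for an RDMA-accessible memory region on a remote machine; all class and method names are invented.

    from collections import OrderedDict

    class RemoteMemory:
        """Stand-in for memory exposed by a remote node; a real system
        would issue one-sided RDMA READ/WRITE operations here."""
        def __init__(self):
            self._pages = {}

        def write(self, page_id, data):
            self._pages[page_id] = data

        def read(self, page_id):
            return self._pages[page_id]

    class BufferPool:
        """Tiny LRU buffer pool that spills victims to remote memory
        instead of discarding them."""
        def __init__(self, capacity, remote):
            self.capacity = capacity
            self.remote = remote
            self.local = OrderedDict()  # page_id -> bytes, LRU order

        def get(self, page_id):
            if page_id in self.local:
                self.local.move_to_end(page_id)  # mark recently used
                return self.local[page_id]
            data = self.remote.read(page_id)     # fault in from remote memory
            self.put(page_id, data)
            return data

        def put(self, page_id, data):
            self.local[page_id] = data
            self.local.move_to_end(page_id)
            if len(self.local) > self.capacity:
                victim, vdata = self.local.popitem(last=False)
                self.remote.write(victim, vdata)  # spill, do not discard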

    Exploiting the Power of Relational Databases for Efficient Stream Processing

    Stream applications have gained significant popularity over recent years, leading to the development of specialized stream engines. These systems are designed from scratch, with a different philosophy than today's database engines, in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system that exploits techniques and algorithms accumulated over many years of database research. In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are stored directly upon arrival in a new kind of system table, called a basket. A continuous query can then be evaluated over its relevant baskets as a typical one-time query, exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single query plan or to multiple similar query plans. Furthermore, a query plan can be split into multiple parts, each with its own input/output baskets, allowing for flexible load sharing and query scheduling. Contrary to traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., querying a basket only after xx tuples have arrived or after a time threshold has passed. Furthermore, we are not restricted to processing tuples in the order they arrive; instead, we can selectively pick tuples from a basket based on the query requirements, exploiting a novel query component, the basket expression. We investigate the opportunities and challenges that arise with such a direction and show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms, using both micro-benchmarks and the standard Linear Road benchmark, demonstrate the potential of this new approach.
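
    The basket model lends itself to a compact sketch. The Python below is a toy illustration under stated assumptions: Basket, ready, drain, and the threshold defaults are invented; from the abstract come only the ideas that tuples accumulate in a basket, that a continuous query runs over it like a one-time query once a tuple-count or time threshold is reached, and that consumed tuples are dropped.

    import time

    class Basket:
        """System table buffering incoming stream tuples."""
        def __init__(self):
            self.tuples = []
            self.last_flush = time.monotonic()

        def append(self, tup):
            self.tuples.append(tup)

        def ready(self, min_tuples=100, max_wait_s=1.0):
            # Batch processing: evaluate only once enough tuples arrived
            # or a time threshold has passed.
            return (len(self.tuples) >= min_tuples or
                    time.monotonic() - self.last_flush >= max_wait_s)

        def drain(self):
            batch, self.tuples = self.tuples, []
            self.last_flush = time.monotonic()
            return batch

    def run_continuous_query(basket, predicate):
        # The continuous query is evaluated like a one-time query over the
        # basket; tuples are dropped once all relevant queries (here: one)
        # have seen them.
        if basket.ready():
            return [t for t in basket.drain() if predicate(t)]
        return []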

    An architecture for recycling intermediates in a column-store

    Automatically recycling (intermediate) results is a grand challenge for state-of-the-art databases seeking to improve both query response time and throughput. In the tuple-at-a-time processing pipeline, tuples are loaded and streamed through the plan while avoiding materialization of intermediates as much as possible. This limits the opportunities for reuse of overlapping computations to DBA-defined materialized views and function/result cache tuning. In contrast, the operator-at-a-time execution paradigm produces fully materialized results in each step of the query plan. To avoid resource contention, these intermediates are evicted as soon as possible. In this paper we study an architecture that harvests the by-products of the operator-at-a-time paradigm in a column-store system using a lightweight mechanism, the recycler. The key challenge then becomes the selection of policies: which intermediates to admit to the resource pool, how long to retain them, and which to evict when facing resource limitations. The proposed recycling architecture has been implemented in an open-source system. An experimental analysis against the TPC-H ad-hoc decision support benchmark and a complex, real-world application (SkyServer) demonstrates its effectiveness in terms of self-organizing behavior and significant performance gains. The results indicate the potential of recycling intermediates and chart a route for further development of database kernels.
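
    As a rough illustration of the policy questions involved, here is a toy Python recycler with an admission check and cost-based eviction under a space budget. Everything below is an assumption for the sake of the example; in particular the benefit metric (recomputation cost times hits per byte) is invented, not one of the paper's policies.

    class Recycler:
        def __init__(self, budget_bytes):
            self.budget = budget_bytes
            self.pool = {}  # plan-fragment signature -> (result, size, cost, hits)

        def lookup(self, signature):
            entry = self.pool.get(signature)
            if entry is None:
                return None
            result, size, cost, hits = entry
            self.pool[signature] = (result, size, cost, hits + 1)
            return result  # reuse the materialized intermediate

        def admit(self, signature, result, size, cost):
            # Admission policy: never keep an intermediate that alone
            # exceeds the pool budget.
            if size > self.budget:
                return
            self._evict_until(size)
            self.pool[signature] = (result, size, cost, 0)

        def _evict_until(self, needed):
            # Eviction policy: drop the intermediates that are cheapest to
            # recompute, largest, and least reused first.
            def benefit(item):
                _, (_, size, cost, hits) = item
                return cost * (hits + 1) / size
            used = sum(size for _, size, _, _ in self.pool.values())
            for sig, _ in sorted(self.pool.items(), key=benefit):
                if used + needed <= self.budget:
                    break
                used -= self.pool[sig][1]
                del self.pool[sig]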

    Spinning Relations: High-Speed Networks for Distributed Join Processing

    By leveraging modern networking hardware (RDMA-enabled network cards), we can shift priorities in distributed database processing significantly. Complex and sophisticated mechanisms to avoid network traffic can be replaced by a scheme that takes advantage of the network's high bandwidth and low latency.
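
    The shift in priorities can be sketched in a few lines: instead of minimizing data movement, both join inputs are hash-repartitioned across all nodes (cheap on an RDMA-class network) and joined locally. The Python below is a generic repartitioning join for illustration, not the paper's algorithm; the node count and all names are invented, and the "network transfer" is just passing lists along.

    N_NODES = 4  # hypothetical cluster size

    def partition(relation, key):
        """Split a relation into one bucket per node by hashing the join key."""
        buckets = [[] for _ in range(N_NODES)]
        for row in relation:
            buckets[hash(row[key]) % N_NODES].append(row)
        return buckets

    def local_hash_join(r_part, s_part, r_key, s_key):
        # Classic in-memory hash join executed independently on each node.
        table = {}
        for row in r_part:
            table.setdefault(row[r_key], []).append(row)
        return [(r, s) for s in s_part for r in table.get(s[s_key], [])]

    def distributed_join(r, s, r_key=0, s_key=0):
        r_parts, s_parts = partition(r, r_key), partition(s, s_key)
        # In a real deployment each node receives its buckets over the
        # network; matching buckets are then joined without coordination.
        out = []
        for node in range(N_NODES):
            out.extend(local_hash_join(r_parts[node], s_parts[node],
                                       r_key, s_key))
        return out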

    A column-store meets the point clouds

    Dealing with LIDAR data in the context of database management systems calls for a re-assessment of their functionality, performance, and storage/processing limitations. The territory of efficient and scalable processing of LIDAR repositories using GIS-enabled database systems is still largely unexplored. Bringing together hard-core database management experts and GIS application developers is a sine qua non for advancing the state of the art, in particular for assessing the relative merits of traditional row-based database engines and modern column-oriented database engines.

    Column-store support for RDF data management: not all swans are white

    This paper reports on the results of an independent evaluation of the techniques presented in the VLDB 2007 paper "Scalable Semantic Web Data Management Using Vertical Partitioning", authored by D. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. We revisit the proposed benchmark and examine both the data and query space coverage. The benchmark is extended to cover a larger portion of the query space in a canonical way. Repeatability of the experiments is assessed using the code base obtained from the authors. Inspired by the proposed vertically partitioned storage solution for RDF data and the performance figures obtained with a column-store, we conduct a complementary analysis of state-of-the-art RDF storage solutions. To this end, we employ MonetDB/SQL, a fully functional open-source column-store, and a commercial row-store DBMS well known for its performance. We implement two relational RDF storage solutions, triple-store and vertically partitioned, in both systems. This allows us to expand the scope of the study with a performance characterization along both dimensions, triple-store vs. vertically partitioned and row-store vs. column-store, individually, before analyzing their combined effects. A detailed report of the experimental test-bed, as well as an in-depth analysis of the parameters involved, clarifies the scope of the solution originally presented and positions the results in a broader context by covering more systems.
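
    To make the two layouts under comparison concrete, the following small Python/sqlite3 sketch (sqlite3 is merely a convenient stand-in engine; table, property, and column names are invented) builds a single triple-store table and its vertically partitioned equivalent, then runs the same query against both.

    import sqlite3

    db = sqlite3.connect(":memory:")
    triples = [("s1", "title", "Foo"), ("s1", "author", "Ann"),
               ("s2", "title", "Bar"), ("s2", "author", "Bob")]

    # Triple-store layout: one wide (subject, property, object) table.
    db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    db.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)

    # Vertically partitioned layout: one (subject, object) table per property.
    for prop in {p for _, p, _ in triples}:
        db.execute(f"CREATE TABLE prop_{prop} (s TEXT, o TEXT)")
        db.executemany(f"INSERT INTO prop_{prop} VALUES (?, ?)",
                       [(s, o) for s, p, o in triples if p == prop])

    # The same question against each layout: titles with their authors.
    q_triple = """SELECT t1.o, t2.o FROM triples t1 JOIN triples t2
                  ON t1.s = t2.s WHERE t1.p = 'title' AND t2.p = 'author'"""
    q_vp = """SELECT prop_title.o, prop_author.o
              FROM prop_title JOIN prop_author
              ON prop_title.s = prop_author.s"""
    assert sorted(db.execute(q_triple)) == sorted(db.execute(q_vp))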